Abstract

Automatic annotation of temporal expressions is a re- 
search challenge of great interest in the field of in- 
formation extraction. In this report, I describe a 
novel rule-based architecture, built on top of a pre- 
existing system, which is able to normalise temporal 
expressions detected in English texts. Gold standard 
temporally-annotated resources are limited in size and 
this makes research difficult. The proposed system 
outperforms the state-of-the-art systems with respect 
to TempEval-2 Shared Task (value attribute) and 
achieves substantially better results with respect to 
the pre-existing system on top of which it has been 
developed. I also introduce a new free corpus
consisting of 2822 unique annotated temporal expressions.
Both the corpus and the system are freely available online^1.

Keywords: information extraction, temporal ex- 
pression, text mining, natural language processing 

1 Introduction 

In many domains, the ability to use and interpret
temporal aspects and events is important for organising
information. Temporal knowledge allows people to filter
information and even infer the temporal flow of events.
Furthermore, it can make question answering, information
retrieval and information filtering systems more effective.

A temporal expression [7], also called a timex, is any
natural language phrase that denotes a temporal entity
such as an interval or an instant. For example, in the
sentence "Italian prime minister Mario Monti said yesterday
that the reform has been very successful." the phrase
"yesterday" is a temporal expression. Timexes establish a
binding between the natural language domain and the time
domain, because it is always possible to represent such
expressions as a time point, interval or set using the ISO
8601 standard^2. Temporal expressions can be of three
different types pQ: fully-qualified, deictic and anaphoric.

Fully-qualified A temporal expression is fully-qualified
with respect to the binding when all the information
required to infer a point in the time domain is included
in the expression itself. The following expressions fall
into this category: March 15 2001, 21st July 1985 or
31/04/2011. Fully-qualified expressions are the easiest to
detect because of their rigid lexical form.

Deictic In this case, inferring the binding with the time
domain requires taking into account the time of utterance
(when the document was written or when the speech was
given). Deictic expressions cannot be properly associated
with a precise time without that information. Typical
deictic temporal expressions are: today, yesterday, last
Sunday and two months ago.

Anaphoric These expressions can be mapped to a precise
point in the time domain only by taking into account
temporal expressions previously mentioned in the text or
speech. Examples of this category are: March 15, the next
week, Saturday. The only difference between deictic and
anaphoric expressions is the location of the temporal
reference: for deictic expressions it is the time of
utterance or publication, for anaphoric expressions it is a
time previously evoked in the text or speech. Anaphoric
expressions remain a challenge for future research in this
field.
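
To make the role of the utterance time concrete, the following minimal Python sketch resolves a handful of deictic expressions against a given date. It is an illustration only and not the normaliser described in this report: the function name `resolve_deictic`, the small set of covered expressions and the reading of "last Sunday" as the most recent Sunday strictly before the utterance date are all assumptions.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_deictic(expression, utterance):
    """Map a few deictic expressions to an ISO 8601 date string.

    Returns None for anything this toy resolver does not cover,
    including anaphoric expressions such as "March 15".
    """
    text = expression.lower().strip()
    if text == "today":
        return utterance.isoformat()
    if text == "yesterday":
        return (utterance - timedelta(days=1)).isoformat()
    if text.startswith("last ") and text[5:] in WEEKDAYS:
        target = WEEKDAYS.index(text[5:])
        # Assumed reading: the most recent such weekday strictly before
        # the utterance date (so 1 to 7 days back).
        days_back = (utterance.weekday() - target) % 7 or 7
        return (utterance - timedelta(days=days_back)).isoformat()
    return None


utterance_time = date(2012, 4, 17)
print(resolve_deictic("yesterday", utterance_time))
print(resolve_deictic("last Sunday", utterance_time))
print(resolve_deictic("March 15", utterance_time))
```

The same expression yields a different value for every utterance date, which is why a deictic timex cannot be normalised from its text alone.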

For the sake of completeness, another kind of categorisation
[13] is also adopted in the field. It identifies the possible
shapes of timexes with respect to their semantics and admits
the following types: time or date references, time references
that anchor on another time, durations, recurring times,
context-dependent times, vague references and times indicated
by an event.

1 http://www.cs.man.ac.uk/~filannim/
2 http://www.w3.org/TR/NOTE-datetime

In my taster project I focussed on the normalisation of
fully-qualified and deictic temporal expressions. I do not
use the latter categorisation because the boundaries between
its types are fuzzy.

2 Background 

The idea of automatically annotating temporal expressions in
texts first appeared in 1998 [3]. Interest in the topic grew
with the proposal of a proper temporal annotation scheme [12].
The original aim was to make the annotation phase easier than
with the previous scheme, in order to collect annotated data
and use the temporal information to enhance the performance of
question answering systems [12]. The most recent systems
[T9l 12"0] proposed for the temporal expression extraction task
go through two distinct steps: identification and
normalisation. This division has become widely accepted by the
research community because it makes the extraction task easier
to approach [1].

In the identification phase the effort is concentrated on
properly detecting the sub-expressions that are genuine
temporal expressions in natural language texts. This step is
usually carried out using machine learning techniques. Ahn et
al. [1] first used Conditional Random Fields [10], showing
better performance than in a previous work [2] in which they
used Support Vector Machines [4]. Poveda et al. [TJ] introduced
a sophisticated bootstrapping technique that enhances the
recognition of temporal expressions, while Mani et al. [12]
used rules learned by a decision tree classifier, C4.5 [___],
and Ling and Weld [11] tried Markov Logic Networks [BJ.

The second step is normalisation. In this phase the main goal
is to interpret the expression, extract the temporal
information and represent it in a proper pre-defined format.
The widely accepted standard for the annotation of temporal
expressions is TimeML [14]. It provides a specification for the
representation of temporal expressions and also events. In this
work the normalisation task aims at producing the TimeML code
that correctly represents the temporal information (see Figure
1). This step is usually accomplished using rule-based
approaches. Grover et al. [9] used a rule-based approach on top
of a pre-existing information extraction system, whereas
Strötgen and Gertz [TJ] produced a small set of hand-crafted
rules with an ad-hoc selection algorithm. UzZaman and Allen
[17] produced a rule-based normaliser focussing just on the
type and value attributes of the TIMEX3 tag (the one used to
represent timexes).

3 Method 

The contribution of my short taster project is twofold. 
Firstly, I will illustrate a temporal expression corpus 
explicitly designed for the normalisation phase. Then 
I will describe the software architecture of a new nor- 
maliser built on top of a pre-existing one. 

3.1 Temporal expressions corpus 

Gold-standard temporally-annotated resources are 
very limited in general domain 5 , and even less in 
specific ones like medical, clinical and biological [5]. 
In the last decade, different sources of annotated temporal
expressions have been developed. Because of the rapid
evolution of this research field, the sources often differ
even with respect to the annotation guidelines. This leads to
the existence of different corpora that are not entirely
compatible with each other.

The main difference among them lies in the tag used to
annotate temporal expressions: TIMEX2 versus TIMEX3. These two
tags reflect completely different ways of annotating the same
temporal expressions, which makes it impossible to use corpora
annotated with different tags at the same time.

I created a corpus of temporal expressions by collecting all
TIMEX3 tags in four different corpora: AQUAINT^3, TimeBank
1.2^4, WikiWars^5 and TRIOS TimeBank v0.1^6. From each document
I extracted all the temporal expressions, and for each one I
also saved the related document creation time, the type (DATE,
TIME, SET or DURATION) and the normalisation provided by the
human annotators. I then compacted the corpus by removing
duplicates. By duplicates I mean completely identical tuples,
i.e. same text, same normalisation, same utterance time and
same type.
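
The compaction step can be sketched in a few lines of Python. This is only an illustration of the duplicate definition given above: the dictionary keys (`text`, `type`, `value`, `utterance`) and the function name `deduplicate` are hypothetical, and the actual extraction scripts may be organised differently.

```python
def deduplicate(records):
    """Keep the first occurrence of each (text, type, value, utterance) tuple."""
    seen = set()
    unique = []
    for record in records:
        key = (record["text"], record["type"],
               record["value"], record["utterance"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"text": "next day", "type": "DATE",
     "value": "2011-09-27", "utterance": "20110926"},
    # Identical in all four fields: removed as a duplicate.
    {"text": "next day", "type": "DATE",
     "value": "2011-09-27", "utterance": "20110926"},
    # Same text but a different utterance time and value: kept.
    {"text": "next day", "type": "DATE",
     "value": "1989-10-31", "utterance": "19891030"},
]
print(len(deduplicate(records)))
```

Note that the same surface text can legitimately appear several times in the corpus, as long as at least one of the other three fields differs.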

I obtained a corpus of 2822 unique annotated temporal
expressions. Table 2 shows an excerpt of the corpus. Further
information about the distribution of temporal expression
types in it is provided in Table 1.

The corpus is freely available^7 in CSV format, using a tab
character as delimiter.
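
A tab-delimited file of this kind can be read with Python's standard `csv` module. The sketch below is hedged: the column order (text, type, value, utterance) is assumed from the layout of Table 2 and is not confirmed here, the real file may also contain a header row, and the in-memory sample stands in for the downloaded corpus.

```python
import csv
import io
from collections import Counter

# Assumed column order; adjust if the real file differs or has a header row.
FIELDS = ("text", "type", "value", "utterance")


def read_corpus(handle):
    """Read tab-delimited rows into dictionaries keyed by FIELDS."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(FIELDS, row)) for row in reader if row]


# In-memory stand-in for the corpus file, using rows from Table 2.
sample = io.StringIO(
    "more than two years\tDURATION\tP2Y\t20110926\n"
    "next day\tDATE\t2011-09-27\t20110926\n"
    "nearly an hour\tDURATION\tPT1H\t19910225\n"
)
corpus = read_corpus(sample)

# Distribution of timex types, in the spirit of Table 1.
type_counts = Counter(record["type"] for record in corpus)
print(type_counts)
```

To read the real corpus, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption.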

3 http://www.ldc.upenn.edu/Catalog/docs/LDC2002T31/
4 http://www.timexportal.info/corpora-timebank12
5 http://www.timexportal.info/wikiwars
6 http://www.cs.rochester.edu/u/naushad/trios-timebank-corpus
7 http://www.cs.man.ac.uk/~filannim/timex3s_corpus.csv
<?xml version="1.0" ?>
<TimeML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://timeml.org/timeMLdocs/TimeML_1.2.1.xsd">
<DOCID>Example_document</DOCID>
<DCT>2012, Manchester, Apr 17, 2012</DCT>
<TITLE>Example document</TITLE>
<TEXT>
Italian prime minister Mario Monti
<EVENT eid="e1" class="OCCURRENCE">said</EVENT>
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" />
<TIMEX3 tid="t1" type="DATE" value="2012-04-16">yesterday</TIMEX3>
that the reform has been very successful.
<TLINK eventInstanceID="ei1" relatedToTime="t1" relType="DURING" />
</TEXT>
</TimeML>



Figure 1: Example of TimeML code. The sentence contains a deictic temporal expression: "yesterday" can be
correctly annotated only by taking into account the document creation time (DCT).
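
As a rough illustration of how an annotation like the one in Figure 1 can be read back programmatically, the sketch below pulls the TIMEX3 elements out of a small TimeML fragment with Python's standard `xml.etree.ElementTree` module. The fragment is a shortened copy of the figure and the function name `extract_timexes` is hypothetical; this is not part of the system described in this report.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document in Figure 1 (events and links omitted).
SAMPLE = """<TimeML>
<DCT>2012, Manchester, Apr 17, 2012</DCT>
<TEXT>
Italian prime minister Mario Monti said
<TIMEX3 tid="t1" type="DATE" value="2012-04-16">yesterday</TIMEX3>
that the reform has been very successful.
</TEXT>
</TimeML>"""


def extract_timexes(xml_text):
    """Return (text, type, value) for every TIMEX3 element in the document."""
    root = ET.fromstring(xml_text)
    return [(timex.text, timex.get("type"), timex.get("value"))
            for timex in root.iter("TIMEX3")]


for text, timex_type, value in extract_timexes(SAMPLE):
    print(text, timex_type, value)
```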



Temporal expression            Type       Value        Utterance
more than two years            DURATION   P2Y          20110926
much of 2010                   DATE       FUTURE_REF   20110926
nearly a month                 DATE       P1M          20110926
nearly an hour                 DURATION   PT1H         19910225
nearly forty years             DURATION   P40Y         1919980120
nearly four years ago          DATE       1994         19980227:081300
nearly three years             DURATION   P3Y          19891030
nearly two months              DURATION   P2M          19980306:131900
nearly two months afterwards   DATE       FUTURE_REF   20110926
nearly two weeks ago           DATE       1989-WXX     19891030
nearly two years               DURATION   P2Y          19980301:141100
next day                       DATE       2011-09-27   20110926



Table 2: Brief excerpt of the corpus. 
Timex type   Frequency
DATE         2307
DURATION     416
TIME         71
SET          28
TOTAL        2822



Table 1: Distribution of TIMEX3 tags in the corpus. 

3.2 Temporal expressions normaliser 

I built a new normaliser on top of TRIOS, the one freely
available from the University of Rochester^8. It is a
rule-based normaliser that achieved the second-best
performance in the TempEval-2 Shared Task [17]. All its rules
take the form of regular expressions arranged in a switch
architecture: the activation of one of them excludes the
activation of all the others.
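
A switch architecture of this kind can be sketched as an ordered list of (pattern, builder) pairs in which the first matching pattern decides the result. The three rules below are invented for illustration and are not taken from TRIOS; only the first-match-wins control flow is the point.

```python
import re

# Hypothetical rules, checked in order. Each pair is (pattern, builder),
# where the builder turns a match object into a (type, value) pair.
RULES = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"),
     lambda m: ("DATE", m.group(0))),
    (re.compile(r"^(\d{4})$"),
     lambda m: ("DATE", m.group(1))),
    (re.compile(r"^(\d+) years?$"),
     lambda m: ("DURATION", "P" + m.group(1) + "Y")),
]


def normalise(expression):
    """Apply the first rule whose pattern matches; later rules are skipped."""
    for pattern, build in RULES:
        match = pattern.match(expression)
        if match:
            return build(match)
    return None  # no rule was activated


print(normalise("3 years"))
print(normalise("1994"))
print(normalise("the weekend"))
```

Because only one rule can fire, the order of the list matters: a more specific pattern has to be placed before a more general one that would also match.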

I introduced a top layer with three new kinds of 
rules: extension, manipulation and post-manipulation 
rules. 

The extension rules are simply new rules that cover cases not
handled by the pre-existing rules, and they are checked
immediately before those. If a temporal expression does not
activate any of the extension rules, it is passed to TRIOS.
For example, some of these rules are used to normalise the
dates of festivities such as "Thanksgiving day" or "Saint
Patrick's day".

The manipulation rules turn particular well-known expressions
into a simpler form before TRIOS processes them. Once one of
these rules is activated, the original temporal expression is
transformed into a reduced one that the pre-existing set of
rules can normalise more easily. After the transformation,
the new temporal expression is passed as input to TRIOS for
the normalisation task.

Lastly, I used the post-manipulation rules to address some
deficiencies in the normaliser by restoring information lost
by TRIOS, thereby improving performance. In this case the
temporal expression is first evaluated through the extension
rules or the original set, and at the end of the normalisation
process the result is enriched with further information. For
example, I used these rules to add information about seasons,
which TRIOS does not consider at all.

In the end, I introduced 32 new regular expression patterns:
16 extension rules, 12 manipulation rules and 4
post-manipulation rules. The entire system is freely
available online^9 under the GNU licence^10.
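
The interplay of the three rule layers can be sketched as follows. Everything in this example is a stand-in chosen for illustration: `base_normalise` is a toy replacement for TRIOS that only understands a bare year, the individual patterns are invented, and resolving "Saint Patrick's day" to the year of utterance is an assumption. Only the ordering of the layers follows the description above.

```python
import re


def base_normalise(expression):
    """Toy stand-in for the pre-existing normaliser: bare years only."""
    match = re.fullmatch(r"\d{4}", expression)
    return ("DATE", match.group(0)) if match else None


# Extension rules: checked before the base rules (hypothetical example).
EXTENSION_RULES = [
    (re.compile(r"^saint patrick's day$", re.I),
     lambda m, year: ("DATE", "%d-03-17" % year)),
]

# Manipulation rules: rewrite the expression into a simpler one.
MANIPULATION_RULES = [
    (re.compile(r"^the (?:spring|summer|autumn|winter) of (\d{4})$", re.I),
     r"\1"),
]

# Post-manipulation rules: enrich the result with information that was lost.
POST_RULES = [
    (re.compile(r"\bsummer\b", re.I), "SU"),
]


def normalise(expression, utterance_year):
    result = None
    for pattern, build in EXTENSION_RULES:
        match = pattern.search(expression)
        if match:
            result = build(match, utterance_year)
            break
    if result is None:
        simplified = expression
        for pattern, replacement in MANIPULATION_RULES:
            if pattern.search(simplified):
                simplified = pattern.sub(replacement, simplified)
                break
        result = base_normalise(simplified)
    if result is not None:
        for pattern, suffix in POST_RULES:
            if pattern.search(expression):
                timex_type, value = result
                result = (timex_type, value + "-" + suffix)
                break
    return result


print(normalise("the summer of 1862", 2012))
print(normalise("Saint Patrick's day", 2012))
```

In this sketch "the summer of 1862" is first reduced to "1862" by a manipulation rule, normalised by the base layer, and finally enriched with the season by a post-manipulation rule.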

8 http://www.cs.rochester.edu/u/naushad/temporal
9 http://www.cs.man.ac.uk/~filannim/timex_normaliser.zip
10 http://www.gnu.org/licenses/gpl.html



4 Evaluation 

I evaluated the normalisation system using the new corpus
described above as a training set, and then measured its
performance on the TempEval-2 Shared Task test set. This
allowed me to compare my normaliser with all the others
evaluated in that challenge.

In order to measure the difference between TRIOS 
and my extension I also tested both of them by using 
the new corpus. It is important to notice that TRIOS 
has been trained on the same data provided in the 
new corpus. For this reason a comparison between 
these systems is legitimate. 

In both cases, the evaluation procedure is based on counting.
Because the normalisation task aims at providing the right
type attribute and the right value attribute, the evaluation
counts how many times the system provides the same value as
the human annotators. It is important to emphasise that every
value provided by the system that differs from the human one
by at least one character is considered an error.
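
The counting procedure amounts to an exact string comparison per attribute. A minimal sketch, assuming that gold and system annotations are available as parallel lists of dictionaries (a representation chosen here for illustration, with the hypothetical function name `attribute_accuracy`):

```python
def attribute_accuracy(gold, predicted, attribute):
    """Fraction of items whose predicted attribute equals the gold one exactly.

    A missing prediction (None) or any difference, even of a single
    character, counts as an error.
    """
    correct = sum(
        1 for g, p in zip(gold, predicted)
        if p is not None and g[attribute] == p[attribute]
    )
    return correct / len(gold)


gold = [
    {"type": "DATE", "value": "2011-04-18"},
    {"type": "DATE", "value": "FUTURE_REF"},
    {"type": "DURATION", "value": "P2Y"},
]
predicted = [
    {"type": "DATE", "value": "2011-04-XX"},
    {"type": "DATE", "value": "2013-09-XX"},
    {"type": "DURATION", "value": "P2Y"},
]
print(attribute_accuracy(gold, predicted, "type"))
print(attribute_accuracy(gold, predicted, "value"))
```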

While this method is quite reasonable for the type attribute,
it may be too restrictive for the value attribute. Some
practical examples help to explain the problem.

• The human annotation of a certain timex is
{type: "DATE", value: "FUTURE_REF"} whereas the system
provides the more specific annotation
{type: "DATE", value: "2013-09-XX"}.

• The system provides an annotation that is less specific
than the one provided by humans. For example, this happens
when the human annotation is
{type: "DATE", value: "2011-04-18"} and the system provides
{type: "DATE", value: "2011-04-XX"}.

In all these cases the annotations are considered completely
wrong. Even when the system provides a partially wrong
annotation, e.g. {type: "DATE", value: "2011-04-23"} for a
human annotation of {type: "DATE", value: "2011-04-18"},
considering it completely wrong may be too strict, because the
year and month are nevertheless correct. This has motivated
the investigation of other evaluation metrics p2].
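
To see how much information exact matching discards, one can count how many leading date fields (year, month, day) two values share. The function below is a hypothetical diagnostic written for this illustration; it is not the metric investigated in the cited work and it was not used in the evaluation reported here.

```python
def shared_leading_fields(gold_value, system_value):
    """Count the leading '-'-separated fields on which two values agree."""
    count = 0
    for gold_field, system_field in zip(gold_value.split("-"),
                                        system_value.split("-")):
        if gold_field != system_field:
            break
        count += 1
    return count


# Partially wrong: year and month agree, the day does not.
print(shared_leading_fields("2011-04-18", "2011-04-23"))
# Less specific system output: year and month agree, the day is unspecified.
print(shared_leading_fields("2011-04-18", "2011-04-XX"))
# Symbolic gold value against a concrete system value: nothing is shared.
print(shared_leading_fields("FUTURE_REF", "2013-09-XX"))
```

Under exact matching all three pairs are simply errors, whereas this view separates near misses from annotations that have nothing in common.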

4.1 Results 

The normalisation results with respect to the TempEval-2
Shared Task are shown in Table 3. The new TRIOS extension
outperforms every other system in the normalisation of value
attributes and performs competitively in the normalisation of
type attributes.

System            type   value
Edinburgh         0.84   0.63
HeidelTime        0.96   0.85
KUL               0.91   0.55
TERSEO            0.98   0.65
TipSem            0.92   0.65
TRIOS             0.94   0.76
TRIOS extension   0.95   0.86

Table 3: Results obtained from TempEval-2 test set.

The table already shows that the normalisation of value
attributes is slightly harder than that of type attributes.

The normalisation results with respect to the new corpus are
shown in Table 4. On this corpus the extension of TRIOS
outperformed the original system by 2.81 percentage points for
the type attribute and 9.13 percentage points for the value
attribute.

System            type     value
TRIOS             0.8572   0.6257
TRIOS extension   0.8853   0.7170

Table 4: Results obtained from the corpus.

I randomly sub-sampled the original corpus 10 times (400
temporal expressions per sample) and measured the performance
of TRIOS and of my extension. A statistical analysis of the
results showed that the difference is statistically
significant (Wilcoxon test), with p = 0.00586 and
p = 0.0001621 respectively.

4.2 Error analysis

The original TRIOS normaliser made 1023 value mistakes and 402
type mistakes, while its extension made 779 and 323
respectively. Through a careful analysis of the errors, I found
many human annotations that seemed wrong at first sight. Once
I analysed the same annotations taking into account the entire
sentence from which each expression had been extracted, I
found that the human annotations were actually right. Some
examples are shown in Table 5.

This leads to the conclusion that further improvements are
possible only if I also consider the resolution of anaphoric
expressions. To do this, it will be necessary to consider a
wider window for each temporal expression, one that takes into
account at least the entire sentence in which each temporal
expression is located.


5 Conclusions 

I introduced a new rule-based normaliser of temporal
expressions and showed that it performs better than the
current state-of-the-art systems on the value attribute of the
TempEval-2 Shared Task. I also presented a corpus of temporal
expressions for normalisation and its purpose. I made both the
normaliser and the corpus freely available online (the GNU
public licence applies).

5.1 Future work 

The work presented in this report is the product of a
preliminary study in the field of information extraction. The
results clearly show the need to cope with anaphoric temporal
expressions in order to substantially enhance the performance
of the normalisation phase. Currently, the normalisation task
takes into account only the temporal expression itself,
without considering a wider window such as the entire sentence
or a pre-defined number of words before and after the
expression. Such a window is required in order to cope with
anaphoric expressions.

My long-term goal is to develop novel temporal expression
extraction techniques and use them in the clinical domain.
Because of the lack of pre-annotated clinical data, I will
explore the use of semi-supervised machine learning approaches
for the identification phase.

5.2 Acknowledgements 

I would like to thank Naushad UzZaman from the University of
Rochester for sharing his normaliser with the scientific
community. I would also like to acknowledge the support of the
UK Engineering and Physical Sciences Research Council in the
form of a doctoral training grant.

References 

[1] D. Ahn, S. F. Adafre, and M. de Rijke. Towards task-based
temporal extraction and recognition. In G. Katz, J.
Pustejovsky, and F. Schilder, editors, Annotating, Extracting
and Reasoning about Time and Events, number 05151 in Dagstuhl
Seminar Proceedings, Dagstuhl, Germany, 2005. Internationales
Begegnungs- und Forschungszentrum für Informatik (IBFI),
Schloss Dagstuhl, Germany.








Expression           human        system
25                   1999-04-25   n/a
last year            1988-Q2      1988
three years before   FUTURE_REF   PAST_REF
the summer of 1862   FUTURE_REF   1862-SU
the weekend          P2D          PRESENT_REF



Table 5: Some errors made by the normaliser. 